[Figure: flowchart — "Simplest way to combine variables; strong theoretical results" → "Robustness; speed; flexibility".]
Lecture seven
\[ x_1u_1 + x_2u_2 + \dots + x_pu_p \]
Theorem 1 (Maximisation of quadratic forms for points on the unit sphere) Let \(\mathbf{B}:p\times p\) be a positive semi-definite matrix with the \(i\)-th largest eigenvalue \(\lambda_i\) and associated eigenvector \(\mathbf{e}_i\). Then we have that
\[ \max_{\mathbf{x}\neq \mathbf{0}} \frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}}=\lambda_1 \; (\mathrm{attained}\; \mathrm{when} \; \mathbf{x}=\mathbf{e}_1) \]
and that
\[ \max_{\mathbf{x}\neq \mathbf{0},\mathbf{x}\perp \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{i-1}} \frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}}=\lambda_i \; (\mathrm{attained}\; \mathrm{when} \; \mathbf{x}=\mathbf{e}_{i}) \]
for \(i\in\{2,3, \ldots, p\}\).
Proof. Let \(\mathbf{P}:p\times p\) be the orthogonal matrix whose \(i\)-th column is the \(i\)-th eigenvector and \(\Lambda\) be the diagonal matrix with ordered eigenvalues along the diagonal. Let \(\mathbf{B}^{1/2}=\mathbf{P}\mathbf{\Lambda}^{1/2}\mathbf{P}'\) and \(\mathbf{y}=\mathbf{P}'\mathbf{x}\).
First, we show that the quadratic form can never be larger than \(\lambda_1\):
\[ \begin{align*} \frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}} &= \frac{\mathbf{x}'\mathbf{B}^{1/2}\mathbf{B}^{1/2}\mathbf{x}}{\mathbf{x}'\mathbf{P}\mathbf{P}'\mathbf{x}} = \frac{\mathbf{x}'\mathbf{P}\mathbf{\Lambda}^{1/2}\mathbf{P}'\mathbf{P}\mathbf{\Lambda}^{1/2}\mathbf{P}'\mathbf{x}}{\mathbf{y}'\mathbf{y}} = \frac{\mathbf{y}'\mathbf{\Lambda}\mathbf{y}}{\mathbf{y}'\mathbf{y}} \\ &= \frac{\sum_{i=1}^p\lambda_iy_i^2}{\sum_{i=1}^py_i^2} \leq \lambda_1 \frac{\sum_{i=1}^py_i^2}{\sum_{i=1}^py_i^2} = \lambda_1 \end{align*} \]
Now we show that this is actually attained for \(\mathbf{x}=\mathbf{e}_1\). Since eigenvectors are by convention length 1, we consider \(\mathbf{e}_1'\mathbf{B}\mathbf{e}_1\).
First, let \(\mathbf{c}_i\) be the unit vector with a 1 in the \(i\)-th position (and 0’s everywhere else).
Expanding \(\mathbf{B}\) by the eigen decomposition, we have that
\[ \mathbf{e}_1'\mathbf{B}\mathbf{e}_1=\mathbf{e}_1'\mathbf{P}\mathbf{\Lambda}\mathbf{P}'\mathbf{e}_1. \]
Since \(\mathbf{B}\) is symmetric, its eigenvectors can be chosen to be orthonormal. We do so, which implies that \(\mathbf{e}_i'\mathbf{P}=\mathbf{c}_i'\). Consequently,
\[ \mathbf{e}_i'\mathbf{P}\mathbf{\Lambda}\mathbf{P}'\mathbf{e}_i = \mathbf{c}_i'\mathbf{\Lambda}\mathbf{c}_i=\lambda_i, \]
and so \(\mathbf{e}_1'\mathbf{P}\mathbf{\Lambda}\mathbf{P}'\mathbf{e}_1=\lambda_1\).
Now, we consider the case where \(\mathbf{x}\) is orthogonal to \(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{i-1}\).
The \(j\)-th component of the vector \(\mathbf{y}=\mathbf{P}'\mathbf{x}\) is the dot product of \(\mathbf{x}\) with the eigenvector \(\mathbf{e}_j\). Since we choose \(\mathbf{x}\) orthogonal to the first \(i-1\) eigenvectors, the first \(i-1\) entries of \(\mathbf{y}\) are zero.
Returning to considering the quadratic form, we have that
\[ \begin{align*} \frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}} &= \frac{\sum_{j=1}^p\lambda_jy_j^2}{\sum_{j=1}^py_j^2} \\ &= \frac{\sum_{j=i}^p\lambda_jy_j^2}{\sum_{j=i}^py_j^2} \leq \lambda_i \frac{\sum_{j=i}^py_j^2}{\sum_{j=i}^py_j^2} = \lambda_i \end{align*} \]
Using the same argument as before, we can show that this is actually attained for \(\mathbf{x}=\mathbf{e}_i\).
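As a sanity check on Theorem 1, the sketch below generates a random positive semi-definite matrix (hypothetical data, built as \(\mathbf{A}'\mathbf{A}\)) and verifies numerically that the Rayleigh quotient never exceeds \(\lambda_1\), attains it at \(\mathbf{e}_1\), and that under the constraint \(\mathbf{x}\perp\mathbf{e}_1\) the bound drops to \(\lambda_2\), attained at \(\mathbf{e}_2\).

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# Hypothetical positive semi-definite matrix B, built as A'A.
A = rng.standard_normal((6, p))
B = A.T @ A

# Eigenpairs sorted so that lambda_1 >= lambda_2 >= ... >= lambda_p.
lam, E = np.linalg.eigh(B)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

def rayleigh(x):
    """The quadratic form x'Bx / x'x."""
    return (x @ B @ x) / (x @ x)

# Unconstrained: the quotient never exceeds lambda_1, and equals it at e_1.
xs = rng.standard_normal((1000, p))
print(max(rayleigh(x) for x in xs) <= lam[0] + 1e-10)   # True
print(np.isclose(rayleigh(E[:, 0]), lam[0]))            # True

# Constrained: projecting random directions onto the complement of e_1
# makes them orthogonal to e_1, so the bound drops to lambda_2,
# which is attained at e_2.
e1 = E[:, 0]
zs = [x - (x @ e1) * e1 for x in xs]
print(max(rayleigh(z) for z in zs) <= lam[1] + 1e-10)   # True
print(np.isclose(rayleigh(E[:, 1]), lam[1]))            # True
```

The small tolerance `1e-10` only absorbs floating-point rounding; the theoretical bounds are exact.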
Theorem 2 (Identifying successive variance-maximising and orthogonal linear combinations) Let \(\mathbf{\Sigma}:p\times p\) be the variance-covariance matrix with the \(i\)-th largest eigenvalue \(\lambda_i\) and associated eigenvector \(\mathbf{e}_i\). Then the \(i\)-th principal component (defined as before) is given by
\[ Y_i = \mathbf{e}_i'\mathbf{X}, \]
implying that
\[\begin{align*} \mathrm{Var}[Y_i] &= \lambda_i \;\forall \; i \in \{1,2, \ldots, p\}, \; \mathrm{and} \\ \mathrm{Cov}[Y_i, Y_j] &= 0 \; \mathrm{for} \; i\neq j. \end{align*}\]
Proof. From Theorem 1, we know that
\[ \max_{\mathbf{x}\neq \mathbf{0}} \frac{\mathbf{x}'\mathbf{\Sigma}\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1, \]
which we achieve by setting \(\mathbf{x}=\mathbf{e}_1\).
Since \(||\mathbf{e}_1||=1\), \(\frac{\mathbf{e}_1'\mathbf{\Sigma}\mathbf{e}_1}{\mathbf{e}_1'\mathbf{e}_1}=\mathbf{e}_1'\mathbf{\Sigma}\mathbf{e}_1\), which is equal to \(\mathrm{Var}[\mathbf{a}'\mathbf{X}]=\mathrm{Var}[Y_1]\) if we set \(\mathbf{a}=\mathbf{e}_1\). This implies that \(\mathbf{a}=\mathbf{e}_1\) maximises the variance, which equals \(\lambda_1\).
Analogous reasoning shows that \(\mathrm{Var}[\mathbf{a}_i'\mathbf{X}]\) attains its maximum value \(\lambda_i\) for \(i\in\{2,3,\ldots,p\}\) when we set \(\mathbf{a}_i=\mathbf{e}_i\), under the restriction that \(\mathbf{a}_i\) be orthogonal to \(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{i-1}\).
For \(i\neq j\), \(\mathrm{Cov}[\mathbf{e}_i'\mathbf{X},\mathbf{e}_j'\mathbf{X}] = \mathbf{e}_i'\mathbf{\Sigma}\mathbf{e}_j = \lambda_j\mathbf{e}_i'\mathbf{e}_j = 0\), since \(\mathbf{\Sigma}\mathbf{e}_j=\lambda_j\mathbf{e}_j\) and the eigenvectors are orthogonal.
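Theorem 2 can also be verified numerically. The sketch below builds a hypothetical covariance matrix \(\mathbf{\Sigma}\) and checks that the matrix \(\mathbf{E}'\mathbf{\Sigma}\mathbf{E}\), whose \((i,j)\) entry is \(\mathbf{e}_i'\mathbf{\Sigma}\mathbf{e}_j\), is diagonal with the eigenvalues along the diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical covariance matrix Sigma (p = 3), built as A'A so it is PSD.
A = rng.standard_normal((5, 3))
Sigma = A.T @ A

# Eigenpairs sorted in decreasing order of eigenvalue.
lam, E = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

# Var[Y_i] = e_i' Sigma e_i and Cov[Y_i, Y_j] = e_i' Sigma e_j, so the
# covariance matrix of the principal components is E' Sigma E.
pc_cov = E.T @ Sigma @ E

print(np.allclose(np.diag(pc_cov), lam))                  # variances are the eigenvalues
print(np.allclose(pc_cov - np.diag(lam), 0, atol=1e-10))  # covariances vanish
```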
From Theorem 5a, we know that the best rank-\(k\) approximation of \(\mathbf{X}\) is given by \(\hat{\mathbf{X}}=\mathbf{U}\mathbf{D}\mathbf{J}_k\mathbf{V}'\).
We wish to show that the matrix approximation \(\tilde{\mathbf{X}}\) constructed from the first \(k\) principal components is the same. Letting \(\mathbf{Y}:n\times p\) be the matrix of scores and \(\mathbf{P}:p\times p\) be the matrix of loadings, we have that
\[ \mathbf{Y} =\mathbf{X}\mathbf{P} \implies \mathbf{Y}\mathbf{P}^{-1} =\mathbf{X}. \]
We can make use of only the first \(k\) principal components by inserting the matrix \(\mathbf{J}_k\):
\[ \mathbf{Y}\mathbf{J}_k\mathbf{P}^{-1} =\tilde{\mathbf{X}}. \]
By definition, \(\mathbf{P}=\mathbf{V}\), as \(\mathbf{P}\) and \(\mathbf{V}\) both have the ordered eigenvectors of \(\mathbf{X}'\mathbf{X}\) along their columns. Since the eigenvectors are orthonormal, \(\mathbf{P}^{-1}=\mathbf{P}'=\mathbf{V}'\). We can therefore write
\[ \mathbf{Y}\mathbf{J}_k\mathbf{V}' =\tilde{\mathbf{X}}. \]
It remains to show that \(\mathbf{Y}=\mathbf{U}\mathbf{D}\). Since \[ \mathbf{Y}=\mathbf{X}\mathbf{P}=\mathbf{X}\mathbf{V}, \]
we have that
\[ \mathbf{Y}=\mathbf{U}\mathbf{D}\mathbf{V}'\mathbf{V}=\mathbf{U}\mathbf{D} \]
by orthogonality of \(\mathbf{V}\) (i.e. \(\mathbf{V}'\mathbf{V}=\mathbf{I}\)). Hence \(\tilde{\mathbf{X}}=\mathbf{U}\mathbf{D}\mathbf{J}_k\mathbf{V}'=\hat{\mathbf{X}}\), as required.
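The equivalence of the two rank-\(k\) approximations can be checked numerically. The sketch below uses a hypothetical centred data matrix, computes \(\mathbf{U}\mathbf{D}\mathbf{J}_k\mathbf{V}'\) from the SVD, and compares it with \(\mathbf{Y}\mathbf{J}_k\mathbf{P}'\) built from the scores and loadings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical centred data matrix X (n = 8, p = 4) and target rank k = 2.
p, k = 4, 2
X = rng.standard_normal((8, p))
X = X - X.mean(axis=0)

# SVD route: X_hat = U D J_k V'.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
Jk = np.diag([1.0] * k + [0.0] * (p - k))
X_hat = U @ np.diag(d) @ Jk @ Vt

# PCA route: scores Y = X P with loadings P = V, then X_tilde = Y J_k P'.
P = Vt.T
Y = X @ P
X_tilde = Y @ Jk @ P.T

print(np.allclose(Y, U @ np.diag(d)))  # Y = U D, as shown above
print(np.allclose(X_hat, X_tilde))     # the two rank-k approximations agree
```

Note that both routes use the same SVD, so the sign indeterminacy of singular vectors does not affect the comparison.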